JMIR Public Health and Surveillance — Latest Matching Preprints

1

Comfort with AI for HIV Prevention Among Cisgender Women in New York City

Reyes Nieva, H.; Flanagan, M.; Huang, S.; Theodore, D. A.; Nkodo, A. F.; Parkinson, M.; Hill, S.; McAndrew, M.; Benitez, J. A.; Peralta, H.; Amesty, S.; Zucker, J. E.; Sobieszczyk, M.; Castor, D.

2026-06-03 health informatics 10.64898/2026.06.02.26354471 medRxiv

Top 0.1%

18.3%

Background: Long-acting pre-exposure prophylaxis (PrEP) expands HIV prevention options for women. However, PrEP impact depends on addressing persistent gaps in awareness, access, and use. Artificial intelligence (AI) tools, including conversational agents, are being explored to advance PrEP uptake, but comfort with AI may influence their impact. Thus, we examined women's comfort with AI and its association with PrEP awareness. Methods: We analyzed self-reported data from women aged ≥18 years in a cross-sectional survey conducted in New York City from August 2023 to August 2024. We performed descriptive analyses, applied latent class analysis to identify AI knowledge/comfort profiles, and estimated unadjusted and adjusted odds ratios to assess associations between profile membership and PrEP awareness. Results: Among 306 respondents without a diagnosis of HIV who completed AI-related survey items, the median age was 36. Most women identified as Hispanic/Latina (60%) or Non-Hispanic Black (18%), had not completed college (53%), and spoke only English or were bilingual (81%). Latent class analysis identified four AI knowledge/comfort profiles that differed by PrEP awareness, race/ethnicity, borough, prior drug use, and technology utilization. Women with varied AI knowledge, broad AI discomfort, and comfort with clinicians maintaining privacy had lower odds of PrEP awareness (OR: 0.35, 95% CI: 0.16-0.75), but this association did not persist after statistical adjustment. Conclusions: PrEP awareness and AI knowledge were limited, yet many women expressed openness to AI-enabled tools when privacy was assured. AI-enabled HIV prevention tools should prioritize trust, transparency, confidentiality, and the lived contexts of the women they intend to serve.

2

Analytical Centralization of Health Expenditure at the National Administrator of Health System Resources: Architecture, Data Quality, and Operational Performance of the ADRES Health System Analytics Platform, Colombia

Garavito Jimenez, D. A.; Bello Angulo, D. E.; Mejia Lemus, L. T.; Chipatecua, D.; Fula, D. D.; Perez-Rubiano, S.; Martinez, F. L.; Bohorquez Pinzon, J. C.

2026-06-10 public and global health 10.64898/2026.06.08.26355159 medRxiv

Top 0.1%

8.2%

Background Between 2024 and 2025, Colombia universalized the Electronic Health Invoice with embedded Individual Health Services Delivery Records (RIPS), together known as FEV-RIPS, as the standard for financial and clinical data exchange. ADRES -- the entity responsible for administering the resources of Colombia's General Social Security Health System -- faced the challenge of processing information from multiple heterogeneous sources generated by more than 55,000 healthcare providers. Health systems in high-income countries converge clinical-financial data in consolidated platforms; Colombia started from a fragmented architecture with incompatible historical sources, no cross-database standardization, and no centralized analytical infrastructure until 2023. Objective We describe the design, technical challenges of integrating heterogeneous data, and operational performance of the analytical infrastructure built by ADRES to centralize large-scale processing of Colombian health system information, and derive transferable lessons for health system resource administrators in Latin America facing equivalent digitalization mandates. Methods Technical-descriptive report based on operational metrics from the ADRES Azure/Databricks environment during January-November 2025. We report indicators of data volume, processing speed, computational capacity, concurrent use by functional group, and governance structure. The architecture integrates VPN connectivity with MinSalud, automated processing of multiple formats (XML, relational tables, flat files), and a medallion data lake (Bronze/Silver/Gold). Data quality challenges include structural inconsistencies across sources, coding incompatibilities (municipalities, dates, diagnoses), format heterogeneities in unstructured data, and absent technical documentation. 
Results The platform manages 21 catalogs, 1,183 tables, and over 110,645 million stored records, with cumulative production exceeding 1 trillion processed records. It executes queries on 100 billion records in ten seconds using clusters of up to 32 TB RAM and 4,096 vCPU. During September-October 2025, monthly query peaks reached 78,028 across eleven functional groups. Integration required Python/PySpark parsers for variable-depth XML, equivalence tables for incompatible municipality codes, cleaning routines for extreme dates used as nulls (1900-01-01, 9999-12-31), and transformation logic bridging classic RIPS and FEV-RIPS. The platform supported econometric analyses, judicial mandate responses, and public interactive dashboards. Conversational AI integration (Genie, Copilot) extends analytical access to users without SQL knowledge. Conclusions ADRES built in one year an analytical infrastructure that provides, to our knowledge, the first published documentation of the systemic technical challenges of integrating heterogeneous data sources in a middle-income social security health system. Centralizing health system information at national scale is technically feasible under public institutional constraints -- but requires solving cross-source standardization problems the implementation literature does not document with quantitative precision. The derived lessons are transferable to health system resource administrators in Latin America facing equivalent challenges.
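
The cleaning steps named in this abstract (extreme dates used as nulls, equivalence tables for municipality codes) can be illustrated with a minimal Python sketch. It is not the ADRES code: the function names, the ISO date format, and the example equivalence entries are assumptions made for illustration only.

```python
from datetime import date
from typing import Optional

# Placeholder dates the abstract lists as stand-ins for missing values.
SENTINEL_DATES = {date(1900, 1, 1), date(9999, 12, 31)}

# Hypothetical equivalence table: source-specific code -> harmonized code.
# The entries are illustrative, not taken from the platform.
MUNICIPALITY_EQUIVALENCE = {"5001": "05001", "05001": "05001", "11001": "11001"}


def clean_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO date string; return None for blanks, bad values, or sentinels."""
    if not raw:
        return None
    try:
        parsed = date.fromisoformat(raw.strip())
    except ValueError:
        return None
    return None if parsed in SENTINEL_DATES else parsed


def harmonize_municipality(code: Optional[str]) -> Optional[str]:
    """Map a source code to its harmonized form; unknown codes become None."""
    if code is None:
        return None
    return MUNICIPALITY_EQUIVALENCE.get(code.strip())
```

Returning None rather than raising keeps unmapped values visible as nulls, so they can be counted as a data quality indicator instead of stopping a batch.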

3

A Deep Learning-Based Predictive Algorithm for Metabolic Syndrome Detection in the U.S. Population

Correa Segade, C.; Solozabal, R.; Hammouri, Z. A. A.; Gomez-Peralta, F.; Rossman, H.; Vidal, J. C.; Klonoff, D. C.; Segal, E.; Matabuena, M.

2026-06-02 endocrinology 10.64898/2026.05.24.26354007 medRxiv

Top 0.1%

8.2%

Objective To develop clinically operational, population-representative risk-score models for detecting metabolic syndrome (MetS) in U.S. adults by incorporating the NHANES survey design. Research Design and Methods We analyzed 36,812 U.S. adults from NHANES 1988-2018. Seven models of increasing clinical complexity were trained and evaluated, ranging from basic demographics to full biochemical panels. We used a new deep-learning methodology for survey data with a predictive uncertainty quantification model. Results A model combining anthropometrics, vital signs, and a basic lipid panel achieved an AUC of 0.923 at an estimated cost of 0.40 EUR per individual. Adding diabetes-specific biomarkers, including fasting plasma glucose (FPG) and glycated hemoglobin (HbA1c), yielded only marginal improvements. Conclusions This low-cost population-representative screening tool for MetS may help identify at-risk individuals and support data-driven public health interventions.

4

Stigmatizing Language Detection in Opioid Use Disorder Patient-Directed Discharge Clinical Documentation: A Privacy-Preserving Analysis Using a Locally Deployed Large Language Model

Izzo, J. A.; McIntyre, A. M.; Nguyen, J.; Bashaw, D.; Torrance, C. A.; Foster, J.

2026-06-01 health informatics 10.64898/2026.05.29.26354402 medRxiv

Top 0.1%

7.1%

Objective: Stigmatizing language in the electronic health record (EHR) has been associated with adverse patient experience in substance use disorder care, including opioid use disorder (OUD). This study evaluated a privacy-preserving, locally deployed large language model as a method to detect stigmatizing language in the documentation of OUD patients with patient-directed discharge (PDD). Methods: In a retrospective cohort study, 477 inpatient admissions from the MIMIC-IV database with a diagnosis of opioid use disorder were classified using a locally deployed Gemma-4-31b-it-bf16 model and a predefined 140-term lexicon to identify stigmatizing language in clinical documentation. Results: Stigmatizing language was present in 84.1% (190/226) of the PDD cohort vs 62.2% (156/251) of the non-PDD cohort, an unadjusted odds ratio of 3.21 (95% CI 2.07-4.98; p < 0.0001). After adjustment for age, sex, insurance status, marital status, and race, PDD remained an independent predictor of stigmatizing documentation (aOR 2.24, 95% CI 1.40-3.59; p < 0.0001). Further analysis of stigma intensity showed more stigmatizing markers in the PDD cohort than in the non-PDD cohort (2.85 ± 2.39 vs 2.02 ± 2.44; p < 0.0001). Discussion and Conclusion: Stigmatizing language is detected more frequently in clinical documentation of OUD patients who initiate PDD than in that of patients who follow standard discharge processes. A locally deployed large language model (LLM) offers a scalable, privacy-preserving method to audit clinical documentation for stigmatizing language.
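
The unadjusted odds ratio in this abstract can be reproduced from the reported counts. The sketch below assumes a Wald (log-scale) confidence interval; the abstract does not state which interval method the authors used, and the function name is illustrative.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald CI for a 2x2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Counts reported in the abstract: 190/226 (PDD) vs 156/251 (non-PDD).
or_, lo, hi = odds_ratio_wald(190, 226 - 190, 156, 251 - 156)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # OR 3.21 (95% CI 2.07-4.98)
```

The result matches the reported unadjusted estimate; the adjusted odds ratio cannot be recovered this way because it depends on patient-level covariates.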

5

Variation in Telehealth Use in a National Home Test-to-Treat Program for Acute Respiratory Infections

Losos, W.; Wang, B.; Fisher, K.; O'Connor, L.; Soni, A.; Gerber, B.

2026-05-26 health informatics 10.64898/2026.05.24.26353984 medRxiv

Top 0.1%

7.0%

Background Home Test-to-Treat (HTTT) programs deliver timely antiviral treatment for acute respiratory infections, including COVID-19 and influenza, through at-home testing and telehealth. Because access is often measured by visit occurrence, variation in how and when care is delivered may be overlooked. We hypothesized that telehealth access follows distinct process-based patterns. Methods We analyzed de-identified encounters from the national HTTT program (September 2023-July 2024); 6,213 of 8,160 eligible individuals remained after exclusions for missing data. Phenotypes were derived by k-means clustering of standardized variables capturing encounter timing, modality preference, process duration, and sociodemographic and digital access attributes. Ten-day surveys assessed symptom duration and healthcare utilization. Results Three phenotypes emerged: Delayed/Disrupted Access (n = 1,537; 24.7%), Digitally Engaged but Socioeconomically Vulnerable (n = 1,460; 23.5%), and Mainstream Access and Efficient Utilization (n = 3,216; 51.8%). Mean process duration differed (15.93 [SD 3.84] vs 3.69 [3.31] vs 2.87 [2.41] hours; p < 0.001). Synchronous preference was lowest in the Digitally Engaged group (22.9%); antiviral prescribing was high (88.6%-91.9%). Among 10-day respondents (n = 1,023), symptom duration did not differ. Emergency department visits were most frequent in the Digitally Engaged group (2.3% vs 0.0% and 0.5%; p = 0.02) and urgent care in the Delayed/Disrupted group (5.8% vs 4.1% vs 2.0%; p = 0.02). Conclusions Telehealth use in a national HTTT program formed distinct phenotypes defined by timing, modality, and care-process efficiency. Evaluating equity requires attention to how and when care is delivered, not simply whether it occurred.

6

A Prospective Observational Study on a Multimodal Non-Invasive Physiological Monitoring System (Hayl): Feasibility, Signal Characterization, and Exploratory Biomarker Correlation

Choda, G.; Choda, A.

2026-05-17 endocrinology 10.64898/2026.05.13.26353115 medRxiv

Top 0.2%

6.4%

Chronic conditions such as Type 2 Diabetes Mellitus (T2DM) and Hypertension (HTN) remain underdiagnosed in community settings, particularly in resource-limited populations. Conventional diagnostic approaches rely on episodic measurements and laboratory-based assessments, limiting scalability for large-scale screening. Non-invasive physiological monitoring systems offer a potential pathway for accessible and rapid wellness assessment in real-world environments. This study aimed to evaluate the feasibility, signal acquisition performance, and exploratory physiological signal characteristics of a non-invasive multimodal monitoring system (Hayl) in community-based screening settings. Methods: A prospective, cross-sectional, multicenter observational pilot study was conducted across rural and urban screening camps in south India. A total of 281 adult participants were enrolled, including individuals with known T2DM, HTN, and those without known comorbidities, encompassing both symptomatic and asymptomatic subjects. Physiological data were acquired using the Hayl system, which integrates photoplethysmography (PPG) and temperature sensing. Signal acquisition feasibility, waveform quality, and derived signal characteristics were evaluated. Comparative and exploratory analyses were performed across predefined clinical subgroups. The study was conducted under Institutional Ethics Committee approval in accordance with guidelines from the Indian Council of Medical Research. Conclusion: The Hayl system demonstrated high feasibility for physiological signal acquisition, with successful PPG recordings in 274 participants (97.5%) and temperature signals in 279 participants (99.3%). Most recordings exhibited high waveform quality (74.0%), with observable variations in signal characteristics across clinically relevant subgroups. 
Reduced pulse variability and increased waveform irregularity were more frequently observed in participants with T2DM and HTN, while symptomatic individuals demonstrated greater signal variability compared to asymptomatic participants. Temperature measurements were stable, with a mean peripheral temperature of 33.4 °C and a variation of 1.2 °C. These findings support the potential of Hayl as a non-invasive multimodal platform for community-based wellness screening and exploratory signal-based physiological assessment. Further large-scale and longitudinal studies are required to establish clinical utility.

7

Development and validation of a dynamic risk stratification tool for predicting multidrug-resistant bacterial infections in ICU patients: A clinical prediction model and web-based calculator

Ye, L.; Lyu, B.; Yang, Q.; Mou, X.; Nawawonganun, R.; Laohasiriwong, W.

2026-05-26 intensive care and critical care medicine 10.64898/2026.05.23.26353927 medRxiv

Top 0.2%

6.3%

Background: Multidrug-resistant bacterial (MDRB) infections in intensive care units (ICUs) substantially elevate patient mortality, prolong hospital stays, and impose heavy healthcare cost burdens. Existing predictive models for ICU-acquired MDRB infection predominantly focus on static admission-risk assessment, lacking the capacity to leverage longitudinal treatment data for dynamic risk re-stratification during the ICU stay. Meanwhile, most models suffer from poor clinical interpretability, overreliance on hard-to-collect biomarkers, or absence of deployable clinical tools, limiting real-world translation. Therefore, there is an urgent need to develop a parsimonious, interpretable tool based on routine cumulative data to guide timely intervention. This study aimed to develop an interpretable model with a web calculator to improve clinical applicability. Methods: In this study, we conducted a retrospective analysis of ICU inpatients at the First Affiliated Hospital of Dali University between January 1, 2023, and January 1, 2026. Using the createDataPartition function in R software (random seed = 42), the dataset was stratified and divided into a training group and a validation group in a 7:3 ratio. Feature selection was performed using the Boruta algorithm to validate variable rationality. A multivariable logistic regression model was constructed and visualized as a nomogram, and its performance was compared with six machine learning algorithms (Random Forest, XGBoost, Neural Network, etc.). Model validation was conducted using receiver operating characteristic (ROC) curves, decision curve analysis (DCA), and SHAP value interpretation. Finally, an online R Shiny calculator was developed based on the final model. Results: A total of 3,631 patients were enrolled and divided into a training group (n=2,543) and a validation group (n=1,088) using stratified random sampling. 
Five independent predictors were identified in the training group: hypertension combined with diabetes, antibiotic types, ventilator days, urinary catheter days, and PCT abnormality times. The logistic regression model achieved an AUC of 0.772 (95% CI: 0.733-0.812) in the validation group, outperforming XGBoost (0.763) and Random Forest (0.703). The model demonstrated excellent calibration (Hosmer-Lemeshow χ² = 1.94, P = 0.9829) and positive net clinical benefit across threshold probabilities of 0%-40%. SHAP analysis aligned with regression-derived variable importance rankings, confirming predictor contributions. An open-access online calculator was successfully deployed (https://dongfangshao666.shinyapps.io/MDR_shiny2/), enabling real-time individualized risk stratification at the bedside. Conclusion: This study developed and validated a dynamic, interpretable multidrug-resistant bacterial infection risk prediction model requiring only five routinely collected clinical indicators. The model balances robust predictive performance with high transparency, overcoming key limitations of prior tools. The accompanying web calculator supports dynamic risk reassessment throughout the ICU stay, facilitating precise antimicrobial stewardship, targeted infection control interventions, and optimized resource allocation, bridging the gap between statistical modeling and frontline clinical decision-making.
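
The stratified 7:3 split described in the Methods can be sketched in plain Python. This is an illustration of the idea, not a port of the R routine the authors used, so it will not reproduce their partition; the outcome vector and its prevalence below are made up.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Split indices so each outcome class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, valid = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # randomize within the class only
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        valid.extend(indices[cut:])
    return sorted(train), sorted(valid)


# Hypothetical outcomes: 1 = MDRB infection, 0 = none (prevalence is invented).
labels = [1] * 300 + [0] * 700
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))  # 700 300
```

Splitting within each class is what keeps the infection rate comparable between the training and validation groups, which matters when the outcome is relatively rare.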

8

The SARS-CoV-2 Integrated Genomic Epidemiology Database (IGED): Linking viral genomes with patient-level metadata to advance statewide genomic surveillance in California

Ryder, R.; Elder, J.; Panditrao, M.; Grosgebauer, K.; Katz, R.; Tello, L.; Carroll, E.; Borthwick, D.; Kaur, C.; Smith, R.; Shiau, V.; Wheeler, W.; Reilly, E.; Myers, J.; Nelson, L.; Lim, E.; Arunleung, P.; Baylis, E.; Gilliam, S.; Hennesy-Burt, T.; Bregman, B.; Silver, E.; Kapsak, C.; Wright, S.; Leon, T.; Bell, J.; Morales, C.; Wadford, D. A.

2026-05-19 health informatics 10.64898/2026.05.14.26353263 medRxiv

Top 0.2%

6.3%

In July 2021, the California Code of Regulations Title 17 required all laboratories performing SARS-CoV-2 whole genome sequencing (WGS) to report their sequencing results to the California Department of Public Health (CDPH). These viral genomic data and patient metadata were compiled into the Integrated Genomic Epidemiology Database (IGED). Linking anonymized viral sequences with patient-level information enabled monitoring of infectiousness, pathogenicity, transmission dynamics, evolution, and vaccine evasion among emerging SARS-CoV-2 lineages. Laboratories performing SARS-CoV-2 WGS transmitted sequencing results to CDPH through Electronic Laboratory Reporting (ELR) and non-ELR pathways. CDPH applied uniform reporting requirements but allowed flexibility in specific data formats to accommodate diverse data systems. To preserve data quality and interoperability across heterogeneous sources, CDPH implemented standardization, validation, and deduplication protocols. Snowflake, a cloud-based data storage and analytics platform, and Posit Connect, a cloud deployment and automation platform, supported the management, processing, and integration of data within the IGED. The IGED established links between SARS-CoV-2 WGS data and epidemiologic metadata for 801,418 sequences, representing 81.7% of all sequences reported in California. Lineages reported to the IGED showed strong concordance with lineage proportions in GISAID. Sequences reported to the IGED had average turnaround times longer than one month, and the majority of sequencing was performed in Southern California and Los Angeles. The IGED enhanced genomic surveillance through predictive modeling and monitoring concerning evolutionary trends such as recombination and saltations in persistent infections. 
Development of the IGED highlighted the need for standardized data requirements, sustained funding for sequencing, incentives for data submission, and interdisciplinary collaboration to build an effective genomic surveillance system. This framework for linking genomic and epidemiologic data has not only generated critical insights for SARS-CoV-2 but also provided the foundation for CDPH and other public health organizations to develop similar IGED-like systems for other priority pathogens as genomic surveillance expands.

9

Tobacco use and determinants among adults with non-communicable diseases: Evidence from the 2017 Zambia STEPS survey

BWALYA, C.; MOONGA, G.; MWIINDE, A. M.; BERG, C.; SILUMBWE, A.; ZYAMBO, C.

2026-05-19 public and global health 10.64898/2026.05.15.26353278 medRxiv

Top 0.3%

4.9%

Background: Non-communicable diseases (NCDs) account for approximately 75% of global deaths, with 79% occurring in low- and middle-income countries. Tobacco use remains a major modifiable risk factor, contributing to more than 8 million deaths annually. In Zambia, evidence on tobacco use among individuals with hypertension, diabetes mellitus, and cardiovascular disease remains limited. This study assessed the prevalence and determinants of tobacco use among adults with NCDs in Zambia. Methods: We conducted a secondary analysis of the 2017 Zambia STEPS survey. The analytic sample included 716 adults aged 18-69 years with self-reported hypertension, diabetes, and/or cardiovascular disease. Tobacco use was defined as current smoking or smokeless tobacco use. Multivariable logistic regression was used to estimate adjusted odds ratios (AORs), accounting for the complex survey design. Results: Among 716 participants, 65.5% had hypertension, 7.7% diabetes, and 26.8% cardiovascular disease; 89.5% had multimorbidity. The overall prevalence of tobacco use was 12.2%. Prevalence was 12.2% among those with hypertension, 5.5% among those with diabetes, and 14.1% among those with cardiovascular disease. Tobacco use was significantly higher among males. Female sex was associated with lower odds of tobacco use (AOR = 0.16, 95% CI: 0.05-0.54, p = 0.004). Secondary education (AOR = 0.15, 95% CI: 0.03-0.66) and higher education (AOR = 0.04, 95% CI: 0.01-0.44) were protective. Alcohol consumption increased the odds of tobacco use (AOR = 5.23, 95% CI: 1.17-23.28). Conclusion: Tobacco use remains common among adults with NCDs in Zambia. Integration of tobacco cessation interventions into routine NCD care is urgently needed.

10

Domain-based basal and ambulatory glycemic exposure metrics derived from continuous glucose monitoring: a real-world clinic-based study

Shinde, S. N.; Shinde, R. S.; Bhangaaley, S. Y.

2026-05-26 endocrinology 10.64898/2026.05.24.26353983 medRxiv

Top 0.3%

4.9%

Background: Consensus continuous glucose monitoring (CGM) metrics, including time in range (TIR), time above range (TAR), time below range (TBR), mean glucose, glucose management indicator, and glycemic variability, are essential for modern glucose assessment. However, these whole-day summaries do not explicitly partition nocturnal basal from daytime ambulatory glycemic burden. Objective: To develop and evaluate a complementary domain-based CGM framework that quantifies basal and daytime ambulatory glycemic exposure across oral glucose tolerance test (OGTT)-derived dysglycemia phenotypes. Methods: In this observational, clinic-based study, 253 individuals underwent OGTT with insulin measurement and CGM. Participants were classified using a prespecified OGTT-derived phenotyping algorithm, implemented through a deterministic rules-based web calculator, and collapsed into five groups: NoDM, Increased insulin resistance, Midzone Glycemia, Prediabetes, and Diabetes. CGM files were uniformly reprocessed by selecting the latest contiguous episode and retaining the most recent 15 calendar days with data. The 24-hour profile was partitioned into nocturnal basal (00:00 to <06:00) and daytime ambulatory (06:00 to <24:00) domains. Derived indices included Area of Basal Glycemia (ABG), Area of Prandial/Daytime Ambulatory Glycemia (APG), incremental ABG (iABG), incremental APG (iAPG), and exploratory deficit indices dABG and dAPG. Results: The final dataset contributed 3,647 analyzable CGM days. APG remained higher than ABG across all groups. Mean ABG/APG increased from 80.45/86.38 mg/dL in NoDM to 111.96/124.70 mg/dL in Diabetes. Mean iABG/iAPG increased from 5.65/6.60 to 34.12/38.91 mg/dL, whereas dABG/dAPG declined as dysglycemia worsened. Conclusions: The ABG/APG framework provides interpretable, domain-resolved CGM burden metrics that separate basal from daytime ambulatory exposure and distinguish total burden from above-threshold excess. 
These indices are proposed as adjunctive metrics to support dysglycemia phenotyping, early risk recognition, and treatment monitoring, but are not intended to replace established consensus CGM metrics or diagnostic criteria. External, prospective validation is required.
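
The partition of the 24-hour profile into nocturnal basal (00:00 to <06:00) and daytime ambulatory (06:00 to <24:00) domains can be sketched as follows. The abstract does not give the formulas for ABG and APG, so the per-domain mean glucose used here is only an assumed stand-in, and the function name and sample readings are hypothetical.

```python
from datetime import datetime
from statistics import mean


def domain_means(readings):
    """Return (nocturnal_mean, daytime_mean) in mg/dL.

    readings: iterable of (timestamp, glucose_mg_dl) pairs.
    Domain boundaries follow the abstract: 00:00 to <06:00 and 06:00 to <24:00.
    """
    readings = list(readings)
    nocturnal = [glucose for stamp, glucose in readings if stamp.hour < 6]
    daytime = [glucose for stamp, glucose in readings if stamp.hour >= 6]
    return (
        mean(nocturnal) if nocturnal else None,
        mean(daytime) if daytime else None,
    )


# Made-up readings for one day, for illustration only.
sample = [
    (datetime(2026, 1, 1, 1, 0), 80.0),
    (datetime(2026, 1, 1, 5, 45), 90.0),
    (datetime(2026, 1, 1, 6, 0), 100.0),
    (datetime(2026, 1, 1, 23, 55), 120.0),
]
print(domain_means(sample))  # (85.0, 110.0)
```

A reading stamped exactly 06:00 falls in the daytime domain, consistent with the half-open intervals stated in the Methods.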

11

An AI-Powered Smartphone Application for Universal and Standardized Reading and Interpretation of Lateral Flow Assays

Bermejo-Pelaez, D.; Darias, O.; Pastor, L.; Valles, R.; Diez, N.; Lin, L.; Garcia-Villena, J.; Cuadrado, D.; Vladimirov, A.; Alamo, E.; Postigo, M.; Rodriguez-Dominguez, M.; Canton, R.; Rodriguez-Tudela, J. L.; Alastruey Izquierdo, A.; Bohorquez, L. C.; Rubio, J. M.; Dacal, E.; Luengo-Oroz, M.

2026-05-18 public and global health 10.64898/2026.05.14.26352875 medRxiv

Top 0.3%

4.4%

Introduction. Lateral flow assays (LFAs) are indispensable rapid diagnostic tools in healthcare, enabling point-of-care diagnosis critical for patient management and supporting disease burden assessment and surveillance when results are properly recorded. However, misinterpretation errors and unreported cases remain a concern. A quality-assured, affordable AI-powered tool that supports decision-making during result interpretation could promote proper disease monitoring and epidemiological surveillance. Here, we describe the performance of a universal AI model to digitize and interpret results from multiple LFA types through a smartphone application, a step that could ultimately enable standardized and digitally reportable test outcomes. Methods. The AI algorithm was evaluated in 17 LFA types, including both 2-band and 3-band tests for different diseases and manufacturers. The model was trained on a dataset of 22,576 images captured under diverse lighting conditions with different smartphone models and using a custom mobile application, TiraSpot (Spotlab, Madrid, Spain). To assess generalizability, a leave-one-out cross-validation was applied, wherein each LFA type was iteratively excluded from training and used for testing. Model performance was evaluated using bootstrapping on the inference dataset. Results. In the assessment of the model's ability to generalize to new LFA types not previously analyzed (not included during development), the model achieved an overall AUC of 94.3% for second band detection. This overall performance was enhanced to 99.3% (Sensitivity=98.6%; Specificity=98%) after training with 50 images of each LFA type, highlighting the benefit of additional data for specific LFA types. For the third band detection, where less training data was available, the system achieved an overall AUC of 83.9% for unseen LFAs, improving to 94.2% (Sensitivity=92.9%; Specificity=87.9%) after training with 50 images of each LFA type. Conclusion. 
This system demonstrates the feasibility of an AI-powered universal digital reader for interpreting LFA results from diverse test types using smartphone-captured images. Its compatibility with standard smartphones makes it a universal tool, enabling reliable LFA interpretation across devices and settings. By standardizing test interpretation and digitizing results, this tool could support decision making in result interpretation, enhancing epidemiological surveillance, particularly in resource-limited settings. Its adaptability across various infections highlights its potential to improve diagnostic consistency and support disease management in diverse healthcare settings.
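
The leave-one-out scheme over LFA types can be illustrated with a short Python sketch. It only shows how the folds are formed; the function name, the tuple layout, and the example type names are hypothetical and not drawn from the study.

```python
def leave_one_type_out(samples):
    """Yield (held_out_type, train, test) with one LFA type excluded per fold.

    samples: list of (lfa_type, image_id) pairs.
    """
    lfa_types = sorted({lfa_type for lfa_type, _ in samples})
    for held_out in lfa_types:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Hypothetical image index: three made-up LFA types.
images = [
    ("type_a", "img1"), ("type_a", "img2"),
    ("type_b", "img3"),
    ("type_c", "img4"), ("type_c", "img5"), ("type_c", "img6"),
]
folds = list(leave_one_type_out(images))
for held_out, train, test in folds:
    print(held_out, len(train), len(test))
```

Holding out a whole test type, rather than random images, is what lets the reported AUC speak to unseen LFA formats instead of unseen photos of familiar ones.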

12

End of Average. Understanding Overweight & Obesity: Rationale and Design.

Vanbrabant, E.; Roefs, A.; Goossens, G.; Lemmens, L.; Shapovalova, Y.; Hesen, J.; Mironiuc, C.

2026-06-08 primary care research 10.64898/2026.06.05.26354975 medRxiv

Top 0.4%

4.3%

Background: Obesity is globally recognized as a complex, multifactorial chronic disease, with biological, psychological, environmental and behavioural factors involved in both disease pathogenesis and maintenance. Although previous group-based studies demonstrated involvement of each of these factors, there is large inter-individual variability in the factors contributing to disease development as well as intervention outcomes, causing limited translatability to the individual level. This heterogeneity in treatment effectiveness might be due to differential causal and maintenance factors of obesity. To enable the transition from a one-size-fits-all approach to a more personalized approach for individuals with overweight or obesity, this study aims to investigate if and how the degree of weight loss and changes in daily life behaviour after a combined lifestyle intervention depend on individual baseline profiles comprising person characteristics, biological, psychological, environmental and behavioural factors. Methods: This study will include 600 individuals varying in BMI: 200 participants with a healthy BMI (18.5-24.9 kg/m2), 200 with overweight (BMI 25.0-29.9 kg/m2), and 200 with obesity (BMI ≥30.0 kg/m2). For all participants, a comprehensive individual baseline profile is created, including person characteristics, biological, psychological, environmental and behavioural factors. A clustering method is applied to identify clusters of participants with similar characteristics. Next, we examine if and how these clusters are linked to bodyweight indicators measured at baseline, and how they relate to daily lifestyle behaviour, as measured by ecological momentary assessment (EMA) using a smartphone app and sensor technology (3-week measurements). 
Individuals with overweight or obesity will be randomized to the intensive lifestyle intervention or a lifestyle information condition, to determine if treatment response can be predicted based on cluster characteristics, how daily lifestyle behaviour changes after an intervention, and how changes in daily lifestyle behaviour relate to treatment response. Discussion: The End of Average study aims to characterize a large set of individuals varying in body weight to predict intervention effectiveness measured as changes in body weight indicators and in daily lifestyle behaviours. If reliable predictors of treatment success can be identified, these can be applied in personalized lifestyle interventions to improve lifestyle behaviour, body weight management and overall health.

13

Prescription intervals of medications for chronic use: a cohort study

Muddiman, R.; Donoghue, P.; Gomez Lemus, J.; Doherty, A. S.; Boland, F.; McCarthy, C.; Moriarty, F.

2026-06-09 primary care research 10.64898/2026.06.08.26355164 medRxiv

Top 0.4%

4.2%

Purpose In deprescribing studies, a prescription-free gap is typically used to determine if patients discontinued their treatment. An appropriate gap depends on the typical time between prescriptions during continued use. This work aims to characterise the interval between prescriptions of chronic drugs using different methods for a cohort of older people in primary care in Ireland. Methods The empirical prescription interval was analysed for 38,154 patients for the twenty most common drug classes and the association between covariates and the interval was analysed using a multi-level model. Estimates were also compared to those obtained from the parametric waiting time distribution (pWTD) approach. Results Available covariates had consistent relationships with prescription intervals across drug classes. For example, each additional prescription issue was associated with an increase in the interval by 5.0 (NSAIDs) to 19.7 days ("Other antidepressants"). Full public health cover was associated with a -29.0 day (inhaled adrenergics) to -11.0 day (opioids) change relative to partial cover, while other/private cover had a -17.9 day (benzodiazepines and associated drugs) to -7.1 day (SSRI and SNRIs) change relative to partial cover. The pWTD also produced consistent estimates of the population interval for most drugs. Conclusions The interval varied substantially within drug classes, due to a mixture of patient, practice and unmodelled factors. Variation between practices was effectively explained, with residual variation between patients and within patients. The pWTD approach is useful for describing complex distributions of intervals, and may be more appropriate for inferring a gap than summarising truncated data.
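
An empirical prescription interval can be sketched as the number of days between consecutive issue dates for one patient and drug class. That definition is an assumption, since the abstract does not spell out how the interval was computed, and the dates below are invented.

```python
from datetime import date
from statistics import median


def prescription_intervals(issue_dates):
    """Days between consecutive prescription issue dates, in chronological order."""
    ordered = sorted(issue_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


# Made-up issue dates for one patient and one drug class.
dates = [date(2024, 1, 1), date(2024, 1, 29), date(2024, 3, 1), date(2024, 3, 29)]
gaps = prescription_intervals(dates)
print(gaps, median(gaps))  # [28, 32, 28] 28
```

A prescription-free gap for inferring discontinuation would then be chosen relative to the distribution of such intervals; the sketch covers only the empirical intervals and does not implement the parametric waiting time distribution approach.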

14

Sexually Transmitted and Bloodborne Infections, Methamphetamine Use, and COVID-19 Vaccination in Manitoba, Canada: A Retrospective Matched Cohort Analysis Using Population-Based Administrative Healthcare Data (2020-2022)

Shaw, S. Y. Y.; Mahar, A.; Bailey, K.; Payne, M.; Kindrachuk, J.; Kelly, C.; Friesen, K. J.; Bernstein, C. N.; Reimer, J.; Becker, M. L.; McClarty, L. M.; Stein, D.; Nickel, N. C.

2026-05-21 epidemiology 10.64898/2026.05.18.26353507 medRxiv

Top 0.5%

3.9%

Objectives: To examine COVID-19 vaccine uptake among people diagnosed with sexually transmitted and bloodborne infections (STBBI) and reported methamphetamine users in Manitoba, Canada, during the acute phase of the COVID-19 pandemic. Methods: We conducted a retrospective matched cohort study using linked population-based administrative healthcare, laboratory, and vaccination databases in Manitoba. Individuals aged 16+ years with laboratory-confirmed chlamydia/gonorrhea (CT/NG), syphilis, HIV, and/or documented methamphetamine use during the four years prior to March 1, 2020 were included in eight exposed cohorts. Each cohort was matched to unexposed comparators on age, sex, geographic region, and income quintile. The primary outcome was receipt of 2+ COVID-19 vaccine doses between December 1, 2020 and March 31, 2022. Poisson regression models estimated adjusted rate ratios (aRRs) and 95% confidence intervals (95% CIs) for vaccine uptake. Results: Compared with matched comparators, most exposed cohorts were less likely to complete the COVID-19 primary vaccine series. Individuals in the Syphilis Only (aRR: 0.87, 95% CI: 0.85-0.90), Syphilis Plus (aRR: 0.84, 95% CI: 0.81-0.86), CT/NG Only (aRR: 0.95, 95% CI: 0.94-0.96), CT/NG Plus (aRR: 0.82, 95% CI: 0.80-0.85), Methamphetamine Only (aRR: 0.78, 95% CI: 0.76-0.80), and Methamphetamine + STBBI cohorts (aRR: 0.74, 95% CI: 0.72-0.77) had significantly lower vaccine uptake. The HIV Only cohort did not differ significantly from matched comparators (aRR: 0.98, 95% CI: 0.95-1.01). Lower uptake was concentrated among individuals living in lower-income areas. Conclusions: People diagnosed with STBBI and methamphetamine users in Manitoba experienced significant inequities in COVID-19 vaccine uptake, particularly those with STBBI coinfections and concurrent substance use. Integrated vaccination approaches linked with HIV, harm reduction, and addiction services may improve vaccine equity during future public health emergencies.

15

Combining centralized and decentralized approaches to assess and ensure data quality in Eurocrine® via Microsoft Power BI and DataquieR

Musholt, T. J.; Clerici, T.; Bergenfelz, A.; Schmidt, C. O.; Struckmann, S.

2026-06-05 health informatics 10.64898/2026.06.04.26354884 medRxiv

Top 0.5%

3.9%

Background: Medical registries have gained importance in the evaluation of healthcare quality outcomes. In the absence of high-quality evidence, such as randomized controlled trials, studies based on registry data are essential for informing clinical guidelines. Methods for assessing data quality are rarely described in detail. To ensure the credibility of registry-based studies, registries must use all available technical and operational means to guarantee high data quality. Method: Eurocrine® is a pan-European endocrine surgical database and quality registry, initially funded by the EU healthcare programme, which started in 2015 and included more than 200,000 interventions as of April 2025. To ensure high data quality, interactive and standardized reports are created both centrally and locally via Microsoft Power BI. In addition, comprehensive data quality analyses were performed via the R-based package dataquieR. Results: Although a multitude of technical measures (for example, input screen design and real-time plausibility checks during data entry) are in place, they are not sufficient to prevent human errors at data entry. Errors identified in the reports were corrected, and preventive measures were implemented. Overall, the data quality was assessed as very good in terms of completeness, accuracy, and consistency. Conclusion: It is very important to provide registry users with an efficient and smart tool to identify data issues, as they have the clinical information to correct them. Data quality reports generated with dataquieR represent an effective tool for registry administrators. Predesigned Microsoft Power BI reports enable participating Eurocrine® clinics to self-audit their data.

16

Glycemic response trajectories on metformin monotherapy in real-world diabetes care

Raghavan, S.; Liu, W. G.; Ho, M. R.; Warsavage, T.; Ghosh, D.; Caplan, L.; Reusch, J. E.

2026-05-26 endocrinology 10.64898/2026.05.24.26353996 medRxiv

Top 0.5%

3.7%

Objectives: Diabetes affects over 500 million people globally and glycemia is inadequately managed. Metformin is the most frequently prescribed initial treatment for type 2 diabetes globally, yet glycemic response trajectories to metformin in routine real-world care and predictors of treatment response have not been well described. We aimed to identify glycemic response trajectories in adults prescribed metformin monotherapy as initial type 2 diabetes treatment and predictors of poor glycemic response to metformin. Design: Observational cohort study using latent class mixed models to identify hemoglobin A1c (HbA1c) trajectory classes, followed by random forests machine learning to predict trajectory class membership. Setting: US Veterans Affairs Healthcare System. Participants: Adults treated with metformin alone for >30 days after diabetes diagnosis with a minimum of two HbA1c measurements from 90 days prior to two years after the first metformin prescription (N=140,413). Exposures: Demographic, laboratory, vital sign, and comorbidity data were included as predictors of metformin response trajectory. Main Outcomes and Measures: We included all HbA1c measurements (487,604 total) for two years after metformin initiation to define metformin glycemic response trajectories. Results: We identified three HbA1c trajectories: stably low (89.7% of sample, mean HbA1c decrease from 7.2% to 6.6%), brisk response (7.1% of sample, mean HbA1c decrease from 11.4% to 7.0%), and non-response (3.1% of sample, mean HbA1c increase from 8.9% to 10.8%). Of those in the stably low and brisk response classes at 2 years, 91% maintained HbA1c at approximately 7% on metformin alone for 5 years after drug initiation. Prediction models could accurately predict brisk response (91% accuracy) but not metformin non-response (59% accuracy). Conclusions: Most individuals treated initially with metformin monotherapy have a beneficial and durable glycemic response. Predicting individuals who will not respond to metformin may be challenging but is evident within six months with recommended glycemic surveillance. The findings support current guidelines for HbA1c surveillance when initiating diabetes treatment.

17

Computational framework for the World Health Organization estimates of the global, regional and national burden of foodborne diseases 2026 edition

Devleesschauwer, B.; Vaes, L.; Fernandez, K.; Borghi, E.; Cao, B.; Fastl, C.; Jakobsen, L. S.; Kumapley, R.; Lake, R. J.; Majowicz, S. E.; Minato, Y.; Pires, S. M.; Mughini-Gras, L.; Nane, G. F.; Robertson, L.; Scallan Walter, E.; Torgerson, P. R.; Kretzschmar, M. E.; di Bari, C.

2026-05-17 public and global health 10.64898/2026.05.13.26353030 medRxiv

Top 0.5%

3.7%

Background: Foodborne diseases cause substantial global morbidity and mortality, yet remain largely unattended. To support countries in addressing this public health concern, the World Health Assembly Resolution 73.5 called for strengthening global food safety efforts and led to the development of the WHO Global Strategy for Food Safety 2022-2030, adopted at the 75th WHA (2022). To this end, the World Health Organization (WHO) reconvened the Foodborne Disease Burden Epidemiology Reference Group (FERG) to advise and support the work to generate updated global, regional, and national estimates of the foodborne disease burden for the reference period 2000-2021. Methods: We developed an incidence-based framework expanding coverage to 42 foodborne hazards. Standardized systematic reviews, Global Health Estimates and Global Burden of Disease envelopes, and United Nations population data informed the evidence base. Missing epidemiological data were imputed using Bayesian hierarchical meta-regression models. Disease models mapped acute and chronic health outcomes, applying updated disability weights, life tables, and probabilistic Monte Carlo calculations to estimate incidence, mortality, Years Lived with Disability, Years of Life Lost and Disability-Adjusted Life Years for all 194 WHO Member States. Transparency and analysis reproducibility were ensured through the release of open-source R packages and standardized workflows. Results: The computational framework provides annual, country-level estimates with improved internal consistency and an expanded hazard scope compared with the WHO 2015 edition. Advances include refined modelling, enhanced uncertainty propagation, and broader inclusion of microbial, parasitic, and chemical hazards. Persistent data gaps, especially in high-burden regions, were filled through extensive imputation. Conclusions: The computational framework for the WHO 2026 edition delivers the most comprehensive and transparent assessment of the global burden of foodborne diseases to date. Despite remaining limitations, it enables routine monitoring, supports evaluation of global food safety efforts, and highlights priorities for strengthening national data systems.

18

Temporal Changes in Immunization Information Systems Across U.S. States and Jurisdictions, 2000-2024

Chen, T.; Watanabe, M.; Callaghan, T.; Shioda, K.

2026-06-02 health policy 10.64898/2026.05.29.26354476 medRxiv

Top 0.5%

3.7%

Background: Statewide immunization data are essential for monitoring vaccination trends and evaluating immunization program impact. In the United States, Immunization Information Systems (IIS) were established in the early 1990s to collect these data; however, operational, legal, and procedural details vary across states and over time. This study summarized differences in IIS characteristics, such as legal requirements and reporting procedures, across U.S. states and jurisdictions over time. Methods: We analyzed survey data from previous work in 2000 and from the Centers for Disease Control and Prevention (CDC) in 2012, 2018, and 2024. Our review focused on legislation and reporting requirements for immunization registries across 50 states and 14 jurisdictions, including U.S. territories and Freely Associated States. Results: Between 2000 and 2024, legal frameworks and reporting practices for immunization registries expanded across U.S. states and jurisdictions. The number of states with laws or administrative rules authorizing immunization registries increased from 24 states in 2000 to all 50 states, the District of Columbia, five metropolitan areas, five U.S. territories, and three Freely Associated States in 2024. Over the same period, reporting requirements also became more widespread. The number of states and jurisdictions mandating providers to report immunization records increased from 12 in 2000 to 54 in 2024. Consent policies also changed over time. By 2024, most states and jurisdictions had adopted implicit consent for reporting children's immunization records (41; 64%), while a smaller proportion required explicit parental consent (7; 11%) or implemented mandatory reporting without consent (14; 22%). Discussion: IIS infrastructure and reporting requirements have expanded across U.S. states and jurisdictions over the past two decades, while heterogeneity in consent policies and reporting practices persists. These temporal changes may need to be considered when interpreting IIS data, particularly in longitudinal and cross-jurisdictional analyses.

19

Barriers and facilitators to men's engagement with digital mental health screening in Estonia: An interpretive qualitative study of user archetypes and design implications

Küüsvek, M.; Hallik, R.; Pajusalu, M.; Kuura, A.

2026-05-18 public and global health 10.64898/2026.05.12.26353064 medRxiv

Top 0.6%

3.6%

Background: Mental health issues are prevalent among men, yet help-seeking remains low due to stigma, masculinity norms and access barriers. Digital mental health (DMH) screening questionnaires offer opportunities for early detection, but their uptake among men is limited. Objective: This study explored the barriers and facilitators influencing men's willingness to use DMH screening questionnaires, with the aim of informing user-centered design that supports early detection and engagement. Methods: This interpretive qualitative study was conducted through semi-structured interviews with 17 purposively sampled Estonian men (aged 20-54) in a highly digitalized context until data saturation was reached. Thematic analysis followed a mixed deductive-inductive approach: deductive codes were derived from theoretical frameworks (Technology Acceptance Model, Health Belief Model, User-Centered Design, Behavioral Design), while inductive themes emerged from participants' responses across the three research questions, including their evaluations of four screening questionnaires (PHQ-2, PHQ-9, EEK-2, WHO-5). Results: Key barriers included data privacy fears, distrust of digital solutions, lengthy questionnaires, and poor user experience (UX). Facilitators were anonymity, institutional trust, short (5-10 min) questionnaires, mobile-optimized design, personalized feedback, and clear next steps. As the main contribution, four archetypes were identified: Skeptic, Self-Manager, Explorer, and Situational Seeker. They reflected distinct patterns across privacy concerns, institutional trust, user experience preferences, and help-seeking orientations. Skeptics were characterized by low institutional trust, high concern about data misuse, and a preference for anonymous, low-friction interactions, often delaying help-seeking. In contrast, Self-Managers emphasized autonomy, transparency, and evidence-based support, engaging in structured self-monitoring and purposeful help-seeking. Explorers showed openness to experimentation and engagement, particularly when supported by intuitive, interactive, and visually clear UX, while data sharing depended on perceived value. Situational Seekers demonstrated episodic engagement patterns, where trust, data-sharing, and help-seeking were highly context-dependent, preferring fast, low-effort interactions when needed. Conclusions: Men's uptake of DMH screening questionnaires is influenced by a combination of social, psychological, and usability factors. Effective design should integrate anonymity, institutional credibility, and user-centered features to support engagement and early mental health detection. Personalized, actionable feedback with transparency, user control, and clear next-step guidance emerged as key drivers of sustained engagement, while poor usability and lack of meaningful feedback led to disengagement. Importantly, the proposed archetypes capture how these factors co-occur in dynamic, context-dependent user profiles, offering a more actionable alternative to one-size-fits-all and demographic approaches for designing DMH questionnaires tailored to male users.

20

A longitudinal cohort study comparing clinical trials registered on ClinicalTrials.gov that stopped during the first three years of the SARS-CoV-2 pandemic with trials that stopped in the three years prior

Carlisle, B. G.; Hutchinson, N.; Moyer, H.

2026-05-22 public and global health 10.64898/2026.05.20.26353581 medRxiv

Top 0.6%

3.6%

Background: The global SARS-CoV-2 pandemic disrupted healthcare systems worldwide, raising concerns about its impact on clinical research. Early reports suggested reductions in participant enrollment, interruptions to ongoing trials, and challenges to protocol adherence, yet the magnitude and duration of these operational disruptions remain unclear. Methods: We conducted a registry-based analysis comparing clinical trials during the COVID-19 pandemic (December 2019 to November 2022) with a matched pre-pandemic cohort (December 2016 to November 2019). Studies were included if they reported any modifications to trial status, enrollment, or protocols during the study periods. Key variables included trial stoppage, enrollment changes, and adoption of remote or hybrid procedures. Results: The global SARS-CoV-2 pandemic resulted in widespread disruptions to trial operations with 13,323 clinical trials terminated, suspended or withdrawn over the course of the pandemic, a 38% increase compared to the 9,665 trials that stopped in the 3 years prior to the pandemic. Registries indicated a sharp decline in new participant enrollment across geographic regions and therapeutic areas, with partial recovery in later months. Review findings highlighted barriers including patient inaccessibility, staff redeployment, and supply chain interruptions. Conclusions: The pandemic caused system-wide operational shocks that compromised trial timelines and may have downstream methodological consequences. Recovery in enrollment does not imply restoration of pre-pandemic protocol fidelity or outcome ascertainment. Standardized reporting of disruptions, proactive contingency planning, and resilient trial designs are needed to maintain data integrity during large-scale disruptions and to support reliable evidence generation.
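
The reported 38% increase can be checked directly from the two counts given in the abstract. The short sketch below is only an arithmetic check; the helper name `percent_increase` is illustrative and not taken from the study.

```python
# Counts of stopped (terminated, suspended or withdrawn) trials from the abstract.
stopped_pandemic = 13_323      # December 2019 to November 2022
stopped_pre_pandemic = 9_665   # December 2016 to November 2019

def percent_increase(new: int, old: int) -> float:
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100

increase = percent_increase(stopped_pandemic, stopped_pre_pandemic)
print(f"{increase:.1f}%")  # about 37.8%, which rounds to the reported 38%
```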